SPLASH: structural pattern localization analysis by sequential histograms

Author

  • Andrea Califano
Abstract

MOTIVATION: The discovery of sparse amino acid patterns that match repeatedly in a set of protein sequences is an important problem in computational biology. Statistically significant patterns, that is, patterns that occur more frequently than expected, may identify regions that have been preserved by evolution and may therefore play a key functional or structural role. Sparseness matters because a handful of non-contiguous residues may play a key role, while the residues in between may change without significant loss of function or structure. Similar arguments apply to conserved DNA patterns. Available sparse pattern discovery algorithms are either inefficient or impose limitations on the type of patterns that can be discovered.

RESULTS: This paper introduces a deterministic pattern discovery algorithm, called Splash, which can find sparse amino or nucleic acid patterns matching identically or similarly in a set of protein or DNA sequences. Sparse patterns of any length, up to the size of the input sequence, can be discovered without significant loss in performance. Splash is extremely efficient and embarrassingly parallel by nature. Large databases, such as a complete genome or the non-redundant SWISS-PROT database, can be processed in a few hours on a typical workstation. Alternatively, a protein family or superfamily with low overall homology can be analyzed to discover common functional or structural signatures. Some examples of biologically interesting motifs discovered by Splash are reported for the histone I and G-Protein Coupled Receptor families. Due to its efficiency, Splash can be used to systematically and exhaustively identify conserved regions in protein family sets. These can then be used to build accurate and sensitive PSSM or HMM models for sequence analysis.

AVAILABILITY: Splash is available to non-commercial research centers upon request, conditional on the signing of a test field agreement.

CONTACT: [email protected], Splash main page http://www.research.ibm.com/splash
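
To make the notion of a sparse pattern concrete, the following minimal sketch checks where one given pattern occurs in a set of sequences and how many sequences support it. It is not the Splash algorithm, which discovers patterns rather than testing a supplied one. The `.` wildcard notation, the function names, and the toy sequences are assumptions introduced purely for illustration.

```python
import re


def sparse_pattern_to_regex(pattern: str) -> re.Pattern:
    """Compile a sparse pattern into a regex (hypothetical notation).

    '.' stands for "any residue"; every other character must match exactly.
    The lookahead wrapper lets overlapping occurrences be reported.
    """
    body = "".join("." if ch == "." else re.escape(ch) for ch in pattern)
    return re.compile("(?=(" + body + "))")


def find_matches(pattern: str, sequences: list[str]) -> list[tuple[int, int]]:
    """Return (sequence index, start offset) for every occurrence."""
    rx = sparse_pattern_to_regex(pattern)
    return [
        (i, m.start())
        for i, seq in enumerate(sequences)
        for m in rx.finditer(seq)
    ]


def support(pattern: str, sequences: list[str]) -> int:
    """Count the distinct sequences containing at least one occurrence."""
    return len({i for i, _ in find_matches(pattern, sequences)})


if __name__ == "__main__":
    # Toy sequences, invented for this example.
    seqs = ["MKACDLE", "AKGCQLE", "TTTT", "KKCCLL"]
    print(find_matches("K.C.L", seqs))
    print(support("K.C.L", seqs))
```

Here the pattern `K.C.L` fixes three residues and leaves the two positions between them free, which is the sense in which non-contiguous residues can be conserved while the ones in between vary.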

Similar resources

Fecal coliform dispersal by rain splash on slopes

The movement of fecal pathogens from land to surface and ground water is of great interest because of the public health implications. Non-structural best management practices that control the timing, volume, and placement of animal manures are commonly used to limit opportunities for fecal pathogens to enter water bodies. Increased infiltration capacity, water and waste diversions, and vegetat...


Fuzzy multilevel graph embedding

Structural pattern recognition approaches offer the most expressive, convenient, and powerful, but computationally expensive, representations of underlying relational information. To benefit from the mature, less expensive, and efficient state-of-the-art machine learning models of statistical pattern recognition, they must be mapped to a low-dimensional vector space. Our method of explicit graph embedding br...


Local gradient pattern - A novel feature representation for facial expression recognition

Many researchers adopt Local Binary Pattern for pattern analysis. However, the long histogram created by Local Binary Pattern is not suitable for large-scale facial databases. This paper presents a simple facial pattern descriptor for facial expression recognition. The local pattern is computed based on local gradient flow from one side to the other through the center pixel in a 3x3 pixel region...


Shape Learning with Function-Described Graphs

A new method for shape learning is presented in this paper. This method incorporates abilities from both statistical and structural pattern recognition approaches to shape analysis. It borrows from statistical pattern recognition the capability of modelling sets of point coordinates, and from structural pattern recognition the ability of dealing with highly irregular patterns, such as those gen...


Probabilistic Deterministic Classifier Based Sequential Pattern Mining to Evaluate Structural Pattern on Chemical Bonding

S. Sathya, N. Rajendran. Research Scholar, Bharathiar University, Coimbatore; Principal, Vivekanandha Arts & Science College, Sankiri, Salem (Dt), India. ABSTRACT: Evaluating the structural patterns of chemical bonding involves ident...


Journal title:
  • Bioinformatics

Volume 16, Issue 4

Pages: -

Publication date: 2000